INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction
نویسندگان
چکیده
We present the approach followed by INESCID in the SemEval 2015 Twitter Sentiment Analysis challenge, subtask E. The goal was to determine the strength of the association of Twitter terms with positive sentiment. Using two labeled lexicons, we trained a regression model to predict the sentiment polarity and intensity of words and phrases. Terms were represented as word embeddings induced in an unsupervised fashion from a corpus of tweets. Our system attained the top ranking submission, attesting the general adequacy of the proposed approach.
منابع مشابه
INESC-ID: Sentiment Analysis without Hand-Coded Features or Linguistic Resources using Embedding Subspaces
We present the INESC-ID system for the message polarity classification task of SemEval 2015. The proposed system does not make use of any hand-coded features or linguistic resources. It relies on projecting pre-trained structured skip-gram word embeddings into a small subspace. The word embeddings can be obtained from large amounts of Twitter data in unsupervised form. The sentiment analysis su...
متن کاملBuilding Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach
In this paper, we propose to build large-scale sentiment lexicon from Twitter with a representation learning approach. We cast sentiment lexicon learning as a phrase-level sentiment classification task. The challenges are developing effective feature representation of phrases and obtaining training data with minor manual annotations for building the sentiment classifier. Specifically, we develo...
متن کاملA High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
متن کاملINESC-ID at SemEval-2016 Task 4-A: Reducing the Problem of Out-of-Embedding Words
We present the INESC-ID system for the 2016 edition of SemEval Twitter Sentiment Analysis shared task (subtask 4-A). The system was based on the Non-Linear Sub-space Embedding (NLSE) model developed for last year’s competition. This model trains a projection of pre-trained embeddings into a small subspace using the supervised data available. Despite its simplicity, the system attained performan...
متن کاملSAIL: A hybrid approach to sentiment analysis
This paper describes our submission for SemEval2013 Task 2: Sentiment Analysis in Twitter. For the limited data condition we use a lexicon-based model. The model uses an affective lexicon automatically generated from a very large corpus of raw web data. Statistics are calculated over the word and bigram affective ratings and used as features of a Naive Bayes tree model. For the unconstrained da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015